2. Tests for Stationarity

How do you check whether a time series is stationary?

  • Visual inspection for constant mean and constant variance

  • KPSS test (\(H_0\): Stationary)

  • ADF test (\(H_0\): Not Stationary)

  • PP test (\(H_0\): Not Stationary)

  • Fit AR(1), and see if \(\hat \phi_1\) is significantly different from 1.



KPSS test

Kwiatkowski-Phillips-Schmidt-Shin (1992) test for stationarity:

  • Default choice in auto.arima() \[ \left\{ \begin{array}{ll} H_0: X_t \mbox{ is trend stationary}\\ H_A: X_t \mbox{ is not stationary} \\ \end{array} \right. \]

  • Large p-value means “Stationary”.

  • Decompose the series as \[ X_t \hspace{3mm} = \hspace{3mm} T_t + W_t + Y_t \] where \(T_t\) is deterministic trend, \(W_t\) is random walk, and \(Y_t\) is stationary error.

  • Lagrange multiplier test that the random walk has zero variance.
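As a sketch, the KPSS test can be run with kpss.test() from the tseries package (the same function whose warnings appear in the output later in these notes). The simulated series below is an illustration, not the course data, and assumes tseries is installed.

```r
# Illustration only: kpss.test() comes from the 'tseries' package.
library(tseries)

set.seed(1)
x <- ts(rnorm(200))            # a stationary (white noise) series

kpss.test(x)                   # H0: level stationary; expect a LARGE p-value
kpss.test(x, null = "Trend")   # H0: trend stationary
```

Note that kpss.test() clamps p-values to the range of its lookup table, which is why the notes' output shows warnings like "p-value greater than printed p-value".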


ADF test

Augmented Dickey-Fuller test (Brockwell p. 194)

  • “Unit-root” test

  • The stationarity hypothesis \[ \left\{ \begin{array}{ll} H_0: Y_t \mbox{ is not stationary} \\ H_A: Y_t \mbox{ is stationary} \end{array} \right. \]

  • Is replaced by \[ \left\{ \begin{array}{ll} H_0: Y_t \mbox{ has unit root} \\ H_A: Y_t \mbox{ does not have unit root } \end{array} \right. \]

  • Small p-value rejects the null hypothesis of non-stationarity, and means “Stationary”.

  • Fit AR(1) to a time series, and test whether \(\phi_1\) is significantly different from 1, using \[ \frac{ \hat \phi_1 -1 }{ SE(\hat \phi_1) } \] Under the unit-root null this ratio does not follow \(N(0,1)\); it follows the Dickey-Fuller distribution, which is why adf.test() uses special critical values.
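The last bullet can be sketched in base R on a simulated random walk (an illustration, not the course data). Regressing \(\bigtriangledown Y_t\) on \(Y_{t-1}\) gives a slope estimating \(\phi_1 - 1\); this is the (non-augmented) Dickey-Fuller regression with a constant term.

```r
# Dickey-Fuller idea in base R (simplified: no augmentation lags).
set.seed(1)
y <- cumsum(rnorm(200))          # random walk, so true phi_1 = 1

# Regress diff(y) on y_{t-1}; the slope estimates (phi_1 - 1).
fit   <- lm(diff(y) ~ head(y, -1))
tstat <- coef(summary(fit))[2, "t value"]   # (phi.hat - 1) / SE(phi.hat)

# Compare tstat to Dickey-Fuller critical values (about -2.86 at the
# 5% level with a constant term), NOT to the N(0,1) quantile -1.645.
tstat
```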


PP test

Phillips-Perron test for the null hypothesis that x has a unit root.

  • Same hypotheses as the ADF test

  • Small p-value means “Stationary”.

  • The tests are similar to ADF tests, but they incorporate an automatic correction to the DF procedure to allow for autocorrelated residuals.

  • Calculation of the test statistics is complex.

  • The tests usually give the same conclusions as the ADF tests.
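Base R actually ships a version of this test, stats::PP.test(). A small sketch on simulated series (not the course datasets):

```r
# stats::PP.test() is the Phillips-Perron unit-root test in base R.
set.seed(1)
wn <- ts(rnorm(200))           # stationary white noise
rw <- ts(cumsum(rnorm(200)))   # random walk: has a unit root

PP.test(wn)$p.value            # small p-value -> "Stationary"
PP.test(rw)$p.value            # typically large -> cannot reject the unit root
```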



P-value of (KPSS, ADF, PP) tests

  • (LARGE, small, small) \(\to\) All three indicate stationarity.

  • (small, LARGE, LARGE) \(\to\) All three indicate non-stationarity.

  • (anything else) \(\to\) Conflicting conclusions; inconclusive.

  • Stationarity.tests() performs all three tests.
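Stationarity.tests() is a course-provided helper, not a CRAN function. A hypothetical minimal re-creation, assuming it wraps the three tseries tests and prints the one-row p-value table seen below:

```r
# Hypothetical sketch of the course helper Stationarity.tests(),
# assuming it wraps kpss.test(), adf.test(), pp.test() from 'tseries'.
library(tseries)

Stationarity.tests <- function(A) {
  p <- suppressWarnings(c(KPSS = kpss.test(A)$p.value,
                          ADF  = adf.test(A)$p.value,
                          PP   = pp.test(A)$p.value))
  t(round(p, 3))               # one-row table of p-values
}

set.seed(1)
Stationarity.tests(ts(rnorm(200)))   # a stationary series: expect (LARGE, small, small)
```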



Ex: Using Stationarity.tests() on the datasets above

## Warning in adf.test(A): p-value smaller than printed p-value
## Warning in pp.test(A): p-value smaller than printed p-value
## Warning in kpss.test(A): p-value greater than printed p-value
##        KPSS  ADF   PP
## p-val:  0.1 0.01 0.01
  • Large, small, small \(\to\) stationary, as it should be.



## Warning in adf.test(A): p-value smaller than printed p-value
## Warning in pp.test(A): p-value smaller than printed p-value
## Warning in kpss.test(A): p-value greater than printed p-value
##        KPSS  ADF   PP
## p-val:  0.1 0.01 0.01
  • small, Large, small \(\to\) conflicting. The first two indicate non-stationarity; the last indicates stationarity.



## Warning in pp.test(A): p-value smaller than printed p-value
## Warning in kpss.test(A): p-value smaller than printed p-value
##        KPSS   ADF   PP
## p-val: 0.01 0.706 0.01
  • small, Large, Large \(\to\) All three indicate non-stationarity.



Ex: Australian Steel Data

## Warning in pp.test(A): p-value smaller than printed p-value
## Warning in kpss.test(A): p-value smaller than printed p-value
##        KPSS   ADF   PP
## p-val: 0.01 0.408 0.01

## Warning in pp.test(A): p-value smaller than printed p-value

## Warning in pp.test(A): p-value smaller than printed p-value
##        KPSS   ADF   PP
## p-val: 0.01 0.049 0.01
## Warning in adf.test(A): p-value smaller than printed p-value
## Warning in pp.test(A): p-value smaller than printed p-value
## Warning in kpss.test(A): p-value smaller than printed p-value
##        KPSS  ADF   PP
## p-val: 0.01 0.01 0.01
## Warning in pp.test(A): p-value smaller than printed p-value
## Warning in kpss.test(A): p-value greater than printed p-value
##        KPSS   ADF   PP
## p-val:  0.1 0.046 0.01



3. Modeling Lake Huron Data 4 Different Ways



a) Direct ARMA fit

If we treat it as a “stationary” series, we can try to fit an ARMA(p,q) model.

## Warning in kpss.test(A): p-value smaller than printed p-value
##        KPSS  ADF    PP
## p-val: 0.01 0.24 0.025
## Series: Lake 
## ARIMA(1,0,1) with non-zero mean 
## 
## Coefficients:
##          ar1     ma1    mean
##       0.7665  0.3393  9.1290
## s.e.  0.0773  0.1123  0.3861
## 
## sigma^2 estimated as 0.4784:  log likelihood=-101.09
## AIC=210.18   AICc=210.62   BIC=220.48

##   B-L test H0: the series is uncorrelated
##   M-L test H0: the square of the series is uncorrelated
##   J-B test H0: the series came from Normal distribution
##   SD         : Standard Deviation of the series
##       BL15  BL20  BL25  ML15  ML20   JB    SD
## [1,] 0.963 0.952 0.934 0.567 0.641 0.89 0.684

Analysis 1 (direct fit)

  • auto.arima() chooses ARMA(1,1) with min AICc.

  • ARMA(1,1) with constant mean was fit directly to the data. With observation \(Y_t\),

\[ {\large Y_t \hspace{3mm} = \hspace{3mm} \mu + X_t } \] \[ {\large \hspace{20mm} X_t \hspace{3mm} = \hspace{3mm} \phi_1 X_{t-1} + e_t + \theta_1 e_{t-1} } \] \[ {\large e_t \sim WN(0, \sigma^2) } \]

## Series: Lake 
## ARIMA(1,0,1) with non-zero mean 
## 
## Coefficients:
##          ar1     ma1    mean
##       0.7665  0.3393  9.1290
## s.e.  0.0773  0.1123  0.3861
## 
## sigma^2 estimated as 0.4784:  log likelihood=-101.09
## AIC=210.18   AICc=210.62   BIC=220.48
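The fit can be reproduced in base R. The sketch below uses the built-in LakeHuron series as a stand-in for the slides' Lake data (which appears to be a shifted version of it), so the estimates may differ slightly from the output above.

```r
# Base-R sketch with the built-in LakeHuron series as a stand-in.
# arima() reports the estimated mean as "intercept" when d = 0.
fit <- arima(LakeHuron, order = c(1, 0, 1))   # ARMA(1,1) + constant mean
coef(fit)                                     # ar1, ma1, intercept
```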







b) Forcing a Linear Trend

If we decide that the level is going down linearly, we can fit a line with ARMA errors.

## 
## Call:
## lm(formula = Lake ~ time(Lake))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.50919 -0.74760 -0.01556  0.75966  2.53409 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 55.278141   7.922614   6.977 4.02e-10 ***
## time(Lake)  -0.024071   0.004119  -5.843 7.16e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.136 on 95 degrees of freedom
## Multiple R-squared:  0.2644, Adjusted R-squared:  0.2566 
## F-statistic: 34.14 on 1 and 95 DF,  p-value: 7.165e-08

## Series: Reg2$residuals 
## ARIMA(1,0,1) with zero mean 
## 
## Coefficients:
##          ar1     ma1
##       0.6671  0.3827
## s.e.  0.0937  0.1135
## 
## sigma^2 estimated as 0.452:  log likelihood=-98.72
## AIC=203.44   AICc=203.7   BIC=211.16

##   B-L test H0: the series is uncorrelated
##   M-L test H0: the square of the series is uncorrelated
##   J-B test H0: the series came from Normal distribution
##   SD         : Standard Deviation of the series
##       BL15  BL20  BL25 ML15  ML20    JB    SD
## [1,] 0.985 0.974 0.969 0.19 0.189 0.782 0.669

Analysis 2 (linear trend)

  • Linear trend was fit to \(Y_t\), then the residuals were fitted with ARMA(1,1).

  • We observe \(Y_t = a + bt + X_t\), where \(X_t\) is ARMA.

  • In other words,

\[{\large Y_t \hspace{3mm} = \hspace{3mm} a + bt + X_t }\]

\[{\large \hspace{20mm} X_t \hspace{3mm} = \hspace{3mm} \phi_1 X_{t-1} + e_t + \theta_1 e_{t-1} }\]

\[{\large e_t \sim WN(0, \sigma^2) }\]

## 
## Call:
## lm(formula = Lake ~ time(Lake))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.50919 -0.74760 -0.01556  0.75966  2.53409 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 55.278141   7.922614   6.977 4.02e-10 ***
## time(Lake)  -0.024071   0.004119  -5.843 7.16e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.136 on 95 degrees of freedom
## Multiple R-squared:  0.2644, Adjusted R-squared:  0.2566 
## F-statistic: 34.14 on 1 and 95 DF,  p-value: 7.165e-08
## Series: Reg2$residuals 
## ARIMA(1,0,1) with zero mean 
## 
## Coefficients:
##          ar1     ma1
##       0.6671  0.3827
## s.e.  0.0937  0.1135
## 
## sigma^2 estimated as 0.452:  log likelihood=-98.72
## AIC=203.44   AICc=203.7   BIC=211.16
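This two-stage fit can also be sketched in base R, again using the built-in LakeHuron series as a stand-in for the slides' Lake data, so the numbers may differ slightly.

```r
# Base-R sketch: regress the level on time, then fit ARMA(1,1)
# to the detrended residuals (zero mean, since lm removed it).
Reg2 <- lm(LakeHuron ~ time(LakeHuron))
fit  <- arima(resid(Reg2), order = c(1, 0, 1), include.mean = FALSE)

coef(Reg2)   # negative slope: the level is drifting down
coef(fit)    # ar1 and ma1 for the residual series
```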







c) Box-Jenkins Differencing (ARIMA modeling)

Let’s look at the difference between the observations: \[ \bigtriangledown Y_t \hspace{3mm} = \hspace{3mm} Y_t - Y_{t-1} \]

## Warning in adf.test(A): p-value smaller than printed p-value
## Warning in pp.test(A): p-value smaller than printed p-value
## Warning in kpss.test(A): p-value greater than printed p-value
##        KPSS  ADF   PP
## p-val:  0.1 0.01 0.01
## Series: diff.Y 
## ARIMA(1,0,2) with zero mean 
## 
## Coefficients:
##          ar1      ma1      ma2
##       0.6385  -0.5349  -0.3514
## s.e.  0.1345   0.1445   0.1055
## 
## sigma^2 estimated as 0.4812:  log likelihood=-99.88
## AIC=207.76   AICc=208.2   BIC=218.02

##   B-L test H0: the series is uncorrelated
##   M-L test H0: the square of the series is uncorrelated
##   J-B test H0: the series came from Normal distribution
##   SD         : Standard Deviation of the series
##       BL15  BL20  BL25  ML15  ML20    JB    SD
## [1,] 0.981 0.958 0.946 0.551 0.545 0.575 0.681
## Series: Lake 
## ARIMA(1,1,2) 
## 
## Coefficients:
##          ar1      ma1      ma2
##       0.6385  -0.5349  -0.3514
## s.e.  0.1345   0.1445   0.1055
## 
## sigma^2 estimated as 0.4812:  log likelihood=-99.88
## AIC=207.76   AICc=208.2   BIC=218.02

##   B-L test H0: the series is uncorrelated
##   M-L test H0: the square of the series is uncorrelated
##   J-B test H0: the series came from Normal distribution
##   SD         : Standard Deviation of the series
##       BL15  BL20  BL25  ML15  ML20    JB    SD
## [1,] 0.979 0.956 0.942 0.524 0.495 0.577 0.677

Analysis 3 (take difference)

  • Difference of \(Y_t\) was taken. The difference seems to be ARMA(1,2).

  • In other words, \(Y_t\) is ARIMA(1,1,2), meaning \[{\large \bigtriangledown Y_t = Y_t - Y_{t-1} = X_t }\] \[{\large \hspace{20mm} X_t \hspace{3mm} = \hspace{3mm} \phi_1 X_{t-1} + e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} }\] \[{\large e_t \sim WN(0,\sigma^2) }\]

## Series: Lake 
## ARIMA(1,1,2) 
## 
## Coefficients:
##          ar1      ma1      ma2
##       0.6385  -0.5349  -0.3514
## s.e.  0.1345   0.1445   0.1055
## 
## sigma^2 estimated as 0.4812:  log likelihood=-99.88
## AIC=207.76   AICc=208.2   BIC=218.02
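The equivalence in Analysis 3 can be checked directly in base R (with LakeHuron as a stand-in for the slides' Lake data): fitting ARMA(1,2) with zero mean to the differenced series gives the same estimates as fitting ARIMA(1,1,2) to the original series.

```r
# ARMA(1,2) on diff(Y) vs. ARIMA(1,1,2) on Y: the same model.
fit1 <- arima(diff(LakeHuron), order = c(1, 0, 2), include.mean = FALSE)
fit2 <- arima(LakeHuron,       order = c(1, 1, 2))

coef(fit1)
coef(fit2)   # ar1, ma1, ma2 agree with fit1
```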







d) Box-Jenkins Differencing (ARIMA(0,1,0) modeling)

Again look at the difference between the observations, \[ \bigtriangledown Y_t \hspace{3mm} = \hspace{3mm} Y_t - Y_{t-1} \] but now model it as plain white noise.

##   B-L test H0: the series is uncorrelated
##   M-L test H0: the square of the series is uncorrelated
##   J-B test H0: the series came from Normal distribution
##   SD         : Standard Deviation of the series
##       BL15  BL20  BL25  ML15  ML20    JB    SD
## [1,] 0.176 0.175 0.132 0.172 0.117 0.384 0.737
## Series: Lake 
## ARIMA(0,1,0) 
## 
## sigma^2 estimated as 0.5383:  log likelihood=-106.49
## AIC=214.98   AICc=215.02   BIC=217.54

Analysis 4 (difference is white noise)

  • Difference of \(Y_t\) was taken. The difference seems to be WN.

  • In other words, \(Y_t\) is ARIMA(0,1,0), meaning \[{\large \bigtriangledown Y_t = Y_t - Y_{t-1} = X_t }\] \[{\large \hspace{20mm} X_t \hspace{3mm} = \hspace{3mm} e_t }\] \[{\large e_t \sim WN(0,\sigma^2) }\]

## Series: Lake 
## ARIMA(0,1,0) 
## 
## sigma^2 estimated as 0.5383:  log likelihood=-106.49
## AIC=214.98   AICc=215.02   BIC=217.54
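The "B-L" column of the diagnostic table is the Ljung-Box test, available in base R as Box.test(). A sketch with LakeHuron as a stand-in for the slides' Lake data:

```r
# Ljung-Box test (the "B-L" column): H0: the series is uncorrelated.
d <- diff(LakeHuron)
Box.test(d, lag = 15, type = "Ljung-Box")$p.value

# The random-walk model itself has no ARMA parameters to estimate:
fit <- arima(LakeHuron, order = c(0, 1, 0))   # ARIMA(0,1,0)
fit$sigma2                                    # estimated variance of e_t
```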





Summary

  • Three popular tests for stationarity are the KPSS, ADF, and PP tests.

  • The tests are available in R as kpss.test(), adf.test(), and pp.test().

  • The classical method tries to identify the trend, and tries to remove it by polynomial regression.

  • The Box-Jenkins Method tries to take a difference between today’s data and yesterday’s by applying the \(\bigtriangledown=(1-B)\) operator.

  • Applying the B-J method will often stationarize a non-stationary time series.

  • We can use ARMA(\(p,q\)) model to model the stationarized version of the series.

  • If you use \(\bigtriangledown\) once, and apply ARMA(1,2), then it is called ARIMA (1,1,2) model.
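In R, the \(\bigtriangledown = (1-B)\) operator of the summary is just diff(); a tiny base-R check:

```r
# diff() implements the backward difference (1 - B):
y <- c(5, 7, 4, 9)
diff(y)                    # 2 -3 5, i.e. y_t - y_{t-1}
diff(y, differences = 2)   # apply (1 - B) twice: -5 8
```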





